Time Hopping technique for faster reinforcement learning in simulations
Authors
Abstract
A technique called Time Hopping is proposed for speeding up reinforcement learning algorithms. It is applicable to continuous optimization problems running in computer simulations. By making shortcuts in time, hopping between distant states, and combining this with off-policy reinforcement learning, the technique maintains a higher learning rate. Experiments on a simulated biped crawling robot confirm that Time Hopping can accelerate the learning process more than seven times.
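The abstract gives no implementation details, but the combination it describes (off-policy value updates plus direct jumps between stored simulator states) can be illustrated. Below is a minimal sketch assuming a simulator object with step() and set_state() methods; the hopping condition and the target-selection rule are placeholder assumptions, not the paper's actual criteria.

```python
import random
from collections import defaultdict

class TimeHoppingQLearner:
    """Sketch of off-policy Q-learning with 'time hops': when learning in the
    current region slows down, the simulator is set directly to a previously
    visited state instead of replaying the trajectory that led there."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 hop_threshold=1e-3):
        self.Q = defaultdict(float)          # Q[(state, action)] table
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.hop_threshold = hop_threshold   # assumed hopping criterion
        self.visited = []                    # candidate hop targets

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def step(self, sim, state):
        action = self.act(state)
        next_state, reward = sim.step(action)             # assumed simulator API
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.Q[(state, action)]
        self.Q[(state, action)] += self.alpha * td_error
        self.visited.append(next_state)
        # Illustrative hopping condition: a small TD error suggests the current
        # region is already learned, so hop to some other stored state.
        if abs(td_error) < self.hop_threshold:
            target = random.choice(self.visited)           # naive target selection
            sim.set_state(target)                          # assumed simulator API
            return target
        return next_state
```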
Related resources
Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation gives Time Hopping abilities similar to those that eligibility traces give conventional Reinforcement Learning. It propagates values from one state to all of its temporal predecessors using a state transition graph...
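The propagation idea above can be illustrated with a short sketch, assuming the learner has recorded a graph of observed state transitions (predecessors maps each state to the states seen to lead into it); the decay factor and cut-off threshold are assumed values, not taken from the paper.

```python
def propagate_eligibility(V, predecessors, updated_state, delta,
                          decay=0.9, threshold=1e-4):
    """Push a value change backwards through the recorded transition graph.
    V maps states to values; predecessors maps each state to the set of
    states observed to transition into it (assumed data structures)."""
    frontier = [(updated_state, delta)]
    while frontier:
        state, change = frontier.pop()
        attenuated = decay * change
        if abs(attenuated) < threshold:            # stop once the change is negligible
            continue
        for pred in predecessors.get(state, ()):   # all temporal predecessors
            V[pred] += attenuated                  # credit flows backwards in time
            frontier.append((pred, attenuated))
```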
Time Hopping Technique for Reinforcement Learning and its Application to Robot Control
To speed up the convergence of reinforcement learning (RL) algorithms through more efficient use of computer simulations, three algorithmic techniques are proposed: Time Manipulation, Time Hopping, and Eligibility Propagation. They are evaluated on various robot control tasks. The proposed Time Manipulation [1] is a concept of manipulating time inside a simulation and using it as a tool to speed...
Probability Redistribution using Time Hopping for Reinforcement Learning
A method for using the Time Hopping technique as a tool for probability redistribution is proposed. Applied to reinforcement learning in a simulation, it is able to re-shape the state probability distribution of the underlying Markov decision process as desired. This is achieved by modifying the target selection strategy of Time Hopping appropriately. Experiments with a robot maze reinforcement...
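A hedged sketch of how the target-selection step of Time Hopping could be weighted to re-shape the visited-state distribution is shown below; the specific weighting rule, and the names desired_probs and visit_counts, are illustrative assumptions rather than the paper's method.

```python
import random

def select_hop_target(candidates, desired_probs, visit_counts):
    """Pick a hop target so that under-visited states (relative to a desired
    state distribution) are chosen more often, nudging the empirical
    distribution towards the desired one."""
    total = sum(visit_counts.values()) or 1
    weights = []
    for s in candidates:
        empirical = visit_counts.get(s, 0) / total
        # Larger weight when the state is visited less often than desired.
        weights.append(max(desired_probs.get(s, 0.0) - empirical, 0.0) + 1e-6)
    return random.choices(candidates, weights=weights, k=1)[0]
```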
Decomposition of Reinforcement Learning for Admission Control of Self-Similar Call Arrival Processes
This paper presents predictive gain scheduling, a technique for simplifying reinforcement learning problems by decomposition. Link admission control of self-similar call traffic is used to demonstrate the technique. The control problem is decomposed into on-line prediction of near-future call arrival rates, and precomputation of policies for Poisson call arrival processes. At decision time, the...
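The decomposition described above can be sketched briefly, assuming a table of policies precomputed for fixed Poisson arrival rates and an external predictor of the near-future arrival rate; all names here are hypothetical.

```python
def admit_call(predicted_rate, precomputed_policies, link_state):
    """Apply the decomposition: choose the policy precomputed for the Poisson
    arrival rate closest to the predicted near-future rate, then let that
    policy decide whether to admit the call in the current link state."""
    nearest_rate = min(precomputed_policies, key=lambda r: abs(r - predicted_rate))
    policy = precomputed_policies[nearest_rate]   # maps link state -> accept/reject
    return policy(link_state)
```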
Journal: CoRR
Volume: abs/0904.0545
Pages: -
Publication year: 2009